Skip to content

Fix memory leaks and performance improvements#22

Merged
wangyiqiu merged 4 commits intowangyiqiu:masterfrom
John-194:leak_fixes
Jun 22, 2025
Merged

Fix memory leaks and performance improvements#22
wangyiqiu merged 4 commits intowangyiqiu:masterfrom
John-194:leak_fixes

Conversation

@John-194
Copy link
Copy Markdown
Contributor

@John-194 John-194 commented Jun 17, 2025

Fixes

  1. Major leak: hasEdge duplicates some new treeT objects without removing them. Fix - preallocate them. Note - I had trouble using locks here.

    • Performance improvement: 1.16x;
    • Memory improvement: from ~6.3 GB that accumulated during my test to ~550 MB;
  2. Major leak: Not using Py_DECREF.

    • Performance: Negligible;
    • Memory improvement: from ~550 MB that accumulated during my test, a few MB;
  3. Minor leak: Multiple threads cache the same data in rangeNeighbor at once, overwriting each other. Fix - the use of Locks.

    • Performance: Negligible;
    • Memory improvement: from a few MB (that would cause problems over long-term use) to no leakage;
  4. Performance optimization: The Scheduler gets initialized every DBSCAN call. Fix: call it once.

    • Performance improvement: 1.48x on top of the first one;
    • Memory improvement: None;

For performance and memory testing, I used a point cloud of shape (12895, 2) with unique labels: [-1 0 1 2 3 4 5 6] in an n=2000 for loop.
A final test of n=60000 proved no increase in memory by Heaptrack. Total performance gain 1.72x.

Copy link
Copy Markdown
Owner

@wangyiqiu wangyiqiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @John-194
Thank you for looking into these issues. I generally agree with your changes, just a few questions, could you take a look?

Comment thread include/dbscan/algo.h

parallel_for(0, G->numCell(), [&](intT i) {
if (ccFlag[i]) {
trees[i] = new treeT(G->getCell(i)->getItem(), G->getCell(i)->size(), false);
Copy link
Copy Markdown
Owner

@wangyiqiu wangyiqiu Jun 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major leak: hasEdge duplicates some new treeT objects without removing them. Fix - preallocate them. Note - I had trouble using locks here.

Thank you. In hasEdge, the trees were allocated on demand to save memory in case they are not required. How are the trees getting duplicated without removed? From concurrent calls to hasEdge?

Copy link
Copy Markdown
Contributor Author

@John-194 John-194 Jun 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what was happening was that in hasEdge

  if (!trees[n1])
    trees[n1] = new treeT(cells[n1].getItem(), cells[n1].size(), false);//todo allocation, parallel
  if (!trees[n2]) 
    trees[n2] = new treeT(cells[n2].getItem(), cells[n2].size(), false);//todo allocation, parallel

while one thread is doing new treeT, another thread sees trees[x] is empty, so it tries to do the same. Both threads assign the new treeT to the same index, one of them leaks.

This fix can be improved to work as intended, but currently it works, and speed is improved due to threads not doing duplicate work, so I moved on to fixing other areas.

Comment thread include/dbscan/grid.h
floatT hop = sqrt(dim + 3) * 1.0000001;
nbrCache[bait-cells] = tree->rangeNeighbor(bait, r * hop, fStop, fWrap, true, nbrCache[bait-cells]);
// wait for other threads to do their thing then try again
std::lock_guard<std::mutex> lock(cacheLocks[idx]);
Copy link
Copy Markdown
Owner

@wangyiqiu wangyiqiu Jun 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor leak: Multiple threads cache the same data in rangeNeighbor at once, overwriting each other. Fix - the use of Locks.

Thank you. Did you notice any performance degradation as a result of using mutex?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not notice a change in the performance. Threads should now be doing less unnecessary work, which should overcome the performance cost of mutex.

Comment thread src/dbscanmodule.cpp
@@ -4,6 +4,27 @@
#include "dbscan/pbbs/parallel.h"


Copy link
Copy Markdown
Owner

@wangyiqiu wangyiqiu Jun 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for fixing this!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem! I also have a fix for Arm CPU crashing, which I will upload later. There is also a very rare segmentation fault (~1 in a million chance) that I am looking into.

@wangyiqiu wangyiqiu merged commit 3bfd391 into wangyiqiu:master Jun 22, 2025
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants